Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bidirectional packet capture #6882

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

AryanBakliwal
Copy link

@AryanBakliwal AryanBakliwal commented Dec 23, 2024

fixes: #6862

Added bidirectional field in packet capture CR spec.
For testing, I created two pods and pinged one from the other.

Screenshot of the .pcapng output file.
image

Signed-off-by: Aryan Bakliwal <[email protected]>
Signed-off-by: Aryan Bakliwal <[email protected]>
@AryanBakliwal
Copy link
Author

@hangyan please let me know what you think about this

dependabot bot and others added 3 commits January 1, 2025 18:39
…antrea-io#6877)

Bumps the golang-org-x group with 1 update: [golang.org/x/net](https://github.com/golang/net).


Updates `golang.org/x/net` from 0.32.0 to 0.33.0
- [Commits](golang/net@v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-org-x
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…oup (antrea-io#6880)

Bumps the ginkgo group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.22.0 to 2.22.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.22.0...v2.22.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: ginkgo
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Aryan Bakliwal <[email protected]>
@AryanBakliwal
Copy link
Author

Hey @hangyan @antoninbas, I made the changes in the bpf code according to the one generated by tcpdump.

When I try to test the bidirectional packet capture, it fails with this log

E1229 11:44:58.723258       1 packetcapture_controller.go:424] "PacketCapture failed capturing packets" err="invalid argument" name="pc-test"

Could you please take a look and help me identify what might be going wrong?

Also, golangci-lint is giving me this error even though I have changed the Capture method everywhere.

===> Running golangci (linux) <===
===> Running golangci (windows) <===
cmd/antrea-agent/agent.go:64:2: could not import antrea.io/antrea/pkg/agent/packetcapture (-: # antrea.io/antrea/pkg/agent/packetcapture
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error)) (typecheck)
        "antrea.io/antrea/pkg/agent/packetcapture"
        ^
pkg/agent/packetcapture/capture_interface.go:1: : # antrea.io/antrea/pkg/agent/packetcapture [antrea.io/antrea/pkg/agent/packetcapture.test]
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error) (typecheck)
// Copyright 2024 Antrea Authors.
make: *** [Makefile:329: golangci] Error 1

@hangyan
Copy link
Member

hangyan commented Jan 2, 2025

Hey @hangyan @antoninbas, I made the changes in the bpf code according to the one generated by tcpdump.

When I try to test the bidirectional packet capture, it fails with this log

E1229 11:44:58.723258       1 packetcapture_controller.go:424] "PacketCapture failed capturing packets" err="invalid argument" name="pc-test"

Could you please take a look and help me identify what might be going wrong?

Also, golangci-lint is giving me this error even though I have changed the Capture method everywhere.

===> Running golangci (linux) <===
===> Running golangci (windows) <===
cmd/antrea-agent/agent.go:64:2: could not import antrea.io/antrea/pkg/agent/packetcapture (-: # antrea.io/antrea/pkg/agent/packetcapture
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error)) (typecheck)
        "antrea.io/antrea/pkg/agent/packetcapture"
        ^
pkg/agent/packetcapture/capture_interface.go:1: : # antrea.io/antrea/pkg/agent/packetcapture [antrea.io/antrea/pkg/agent/packetcapture.test]
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error) (typecheck)
// Copyright 2024 Antrea Authors.
make: *** [Makefile:329: golangci] Error 1

this error was reported by golangci on windows, we have a pcap_unsupported.go for windows/... interface, it's an non-op since we didn't support this on windows yet, you should change that too.

@AryanBakliwal
Copy link
Author

I’ve added the direction field with enum values (SourceToDestination, DestinationToSource, and Both) and set the default to SourceToDestination. Additionally, I’ve updated the YAML manifests, packet capture guide and tests accordingly.

Please let me know if any adjustments are needed or if there's anything else I should update or add.

Captured packets

ICMP (SourceToDestination)
Screenshot from 2025-01-06 00-07-02

ICMP (DestinationToSource)
Screenshot from 2025-01-06 00-14-28

ICMP (Both)
Screenshot from 2025-01-06 00-17-58

TCP (SourceToDestination)
image

TCP (DestinationToSource)
image

TCP (Both)
image

pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf_test.go Show resolved Hide resolved
pkg/apis/crd/v1alpha1/types.go Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add an e2e test with the Both direction?

pkg/agent/packetcapture/capture/bpf.go Show resolved Hide resolved
pkg/apis/crd/v1alpha1/types.go Outdated Show resolved Hide resolved
build/yamls/antrea.yml Show resolved Hide resolved
pkg/apis/crd/v1alpha1/types.go Show resolved Hide resolved
Signed-off-by: Aryan Bakliwal <[email protected]>
@AryanBakliwal
Copy link
Author

@antoninbas @hangyan I have refactored the compilePacketFilter function, added a test case for dstPort and Both in bpf_test.go, generated the manifests by running make manifest and added an e2e test with the Both direction.
Could you please review the changes? Let me know if there’s anything else I can improve.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your ongoing work on this

pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved
// (004) ld [26] # Load 4B at 26 (source address)
// (005) jeq #0xaf40102 jt 6 jf 15 # If bytes match(10.244.0.2), goto #6, else #15
// (006) ld [30] # Load 4B at 30 (dest address)
// (007) jeq #0xaf40103 jt 8 jf 26 # If bytes match(10.244.0.3), goto #8, else #26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the code be easier to generate if we jumped to (015) in case this instruction (007) was not a match?

the fact that we know we can skip the second or operand in this case is an optimization. Not saying it's not a good optimization, but if using a "fixed" offset makes code generation much easier, maybe we should do it for now.

I'm thinking about the discussion we just had during the community meeting. That being said, I can see why generating the same code as tcpdump is valuable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AryanBakliwal Please take a look at this ^

Taking a concrete example:

tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3) or (src host 10.244.1.3 and dst host 10.244.1.2))"

This is the code generated by tcpdump:

(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 12
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 12
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 8
(006) ld       [30]
(007) jeq      #0xaf40103       jt 11	jf 12
(008) jeq      #0xaf40103       jt 9	jf 12
(009) ld       [30]
(010) jeq      #0xaf40102       jt 11	jf 12
(011) ret      #262144
(012) ret      #0

If we generate this code instead, I believe it will be simpler:

(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 12
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 12
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 8
(006) ld       [30]
(007) jeq      #0xaf40103       jt 11	jf 8   <- less efficient than tcpdump's version but easier to implement?
(008) jeq      #0xaf40103       jt 9	jf 12
(009) ld       [30]
(010) jeq      #0xaf40102       jt 11	jf 12
(011) ret      #262144
(012) ret      #0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas Actually, the real problem is the calculation of skipTrue (for which I created the function calculateSkipTrue) when we want to accept the packet. But calculateSkipTrue depends on many parameters [srcPort, dstPort, direction, which port is loaded in the register (srcPort or dstPort) and if we are calling this function from the first half (instructions that check request) or the second half (instructions that check response)] to calculate the number of instructions to skip to jump to the returnKeep instruction. This makes the code generation very complex if generating instruction set exactly like tcpdump.

Taking each case:

  1. ICMP (both srcPort and dstPort are 0)
tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3) or (src host 10.244.1.3 and dst host 10.244.1.2))"

Code generated by tcpdump:

(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 12
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 12
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 8
(006) ld       [30]
(007) jeq      #0xaf40103       jt 11	jf 12    <- need to jump to 11 from 7, difference to calculate = 3
(008) jeq      #0xaf40103       jt 9	jf 12
(009) ld       [30]
(010) jeq      #0xaf40102       jt 11	jf 12    <- skipTrue = 0
(011) ret      #262144
(012) ret      #0
  1. TCP (srcPort = 0 and dstPort > 0)
tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3 and dst port 80) or (src host 10.244.1.3 and dst host 10.244.1.2 and src port 80))"

Code generated by tcpdump:

(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 22
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 22
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 13
(006) ld       [30]
(007) jeq      #0xaf40103       jt 8	jf 22    <- jump to next instruction since port(s) not zero
(008) ldh      [20]
(009) jset     #0x1fff          jt 22	jf 10
(010) ldxb     4*([14]&0xf)
(011) ldh      [x + 16]
(012) jeq      #0x50            jt 21	jf 22    <- need to jump to 21 from 12, difference to calculate = 8
(013) jeq      #0xaf40103       jt 14	jf 22
(014) ld       [30]
(015) jeq      #0xaf40102       jt 16	jf 22
(016) ldh      [20]
(017) jset     #0x1fff          jt 22	jf 18
(018) ldxb     4*([14]&0xf)
(019) ldh      [x + 14]
(020) jeq      #0x50            jt 21	jf 22    <- skipTrue = 0
(021) ret      #262144
(022) ret      #0								 

Note that this is similar to when (srcPort > 0 and dstPort = 0) and (srcPort > 0 and dstPort > 0).

Since the problem is to jump to the returnKeep instruction, why not add it again to mitigate this complex calculation and jump to it easily. For example:

  1. ICMP:
    Code that can be generated more simply
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 12
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 12
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 8
(006) ld       [30]
(007) jeq      #0xaf40103       jt 8	jf 13    <- just skip to next instruction if true
(008) ret      #262144
(009) jeq      #0xaf40103       jt 10	jf 13
(010) ld       [30]
(011) jeq      #0xaf40102       jt 12	jf 13
(012) ret      #262144
(013) ret      #0
  1. TCP (srcPort = 0 and dstPort > 0)
    Code that can be generated more simply
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 23
(002) ldb      [23]
(003) jeq      #0x6             jt 4	jf 23
(004) ld       [26]
(005) jeq      #0xaf40102       jt 6	jf 13
(006) ld       [30]
(007) jeq      #0xaf40103       jt 8	jf 23
(008) ldh      [20]
(009) jset     #0x1fff          jt 23	jf 10
(010) ldxb     4*([14]&0xf)
(011) ldh      [x + 16]
(012) jeq      #0x50            jt 13	jf 23    <- just skip to next instruction if true
(013) ret      #262144
(014) jeq      #0xaf40103       jt 15	jf 23
(015) ld       [30]
(016) jeq      #0xaf40102       jt 17	jf 23
(017) ldh      [20]
(018) jset     #0x1fff          jt 23	jf 19
(019) ldxb     4*([14]&0xf)
(020) ldh      [x + 14]
(021) jeq      #0x50            jt 22	jf 23
(022) ret      #262144
(023) ret      #0								 

Let me know what you think about this approach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an extra return instruction sounds fine to me.

I think what we are going for is some code that looks like this in the end:

if direction == SourceToDest {
        generateFilters(offset, srcIP, dstIP, srcPort, dstPort)
} else if direction == DestToSource {
        generateFilters(offset, dstIP, srcIP, dstPort, srcPort)
} else {
        generateFilters(offset, srcIP, dstIP, srcPort, dstPort)
        // ...
        generateFilters(offset, dstIP, srcIP, dstPort, srcPort)
}

So if we can get there and have some generate code that looks reasonable and works, that would be good.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas I have made the requested changes and refactored the code as you suggested and as per our discussion. PTAL and let me know if there is anything else to update or add.

// (012) jeq #0x7b jt 13 jf 27 # TCP Source Port: If 123, goto #13, else #26
// (013) ldh [x + 16] # Load 2B at x+16 (TCP dst port)
// (014) jeq #0x7c jt 15 jf 27 # TCP dst port: If 123, goto #15, else #26
// (015) ret #262144 # MATCH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a stupid question, but if this is the only change we have, it doesn't seem worth it? Why can't we calculate the jump size using size - curOffset - 1 given that:

  1. we always compute the full size up front
  2. we always know the current offset ("where we are")
  3. the instruction before last is always a MATCH return instruction

What am I missing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas You're right, the jump size can indeed be calculated easily with the available information (full size and current offset).
The main challenge, however, isn't the calculation itself but rather when to perform it, meaning in which instructions do we need skipTrue set to size - curOffset - 1.
And if a helper function is used for calculation, then it depends on which instruction is calling the helper function, what is the direction and what is the value of ports. Adding this extra instruction helps simplify the instruction set generation and avoids unnecessary complexity.

The challenges I faced:

For request packets in Both direction, there are 3 cases (assuming the 3 instructions - dstIP. srcPort and srcPort comparison instructions are calling a helper function to get jump size/skipTrue value):

  • When both ports are 0, the destination IP comparison instruction performs the calculation to jump to MATCH.
  • When only one port is 0, the destination IP comparison sets skipTrue to 0, and the comparison for the defined port performs the calculation to jump to MATCH.
  • When neither port is 0, all comparisons set skipTrue to 0, except for the destination port comparison, which performs the calculation to jump to MATCH.

In SourceToDestination, DestinationToSource and response packets in Both, no calculations are needed, and these 3 instructions (dstIP, srcPort and dstPort comparison) jump to MATCH step with skipTrue set to 0.

The helper function that calculates the jump size or returns the value of skipTrue requires additional context which is a bunch of parameters such as the calling instruction, direction, ports, size and current offset. Adding this logic in the code makes it difficult to extend in the future if we filter traffic on more parameters and also increases complexity.

Hope that this explanation makes sense and adding an extra MATCH instruction seems reasonable. Any suggestions on how we could simplify or approach this differently? I'm open to any ideas you might have for improving the implementation of calculation of skipTrue value.

// 'ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3 and src port 123 and dst port 124) or (src host 10.244.1.3 and dst host 10.244.1.2 and src port 124 and dst port 123))'
// And using `tcpdump -i <device> '<filter>' -d` will generate the following BPF instructions:
// (000) ldh [12] # Load 2B at 12 (Ethertype)
// (001) jeq #0x800 jt 2 jf 26 # Ethertype: If IPv4, goto #2, else #26
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in comment section, else #26 should be else $27

return count
}

func compareIPPort(srcAddrVal, dstAddrVal uint32, size, curLen uint8, srcPort, dstPort uint16, useSkipFalse bool) []bpf.Instruction {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function name is not very straightforward, can you use a new one? we may also need some comments.

Also i think the param useSkipFalse maybe misleading, we always use skip false, the difference is skipFalse to ret #0 or skipFalse to some intermediate instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bi-direction capture support for PacketCapture
4 participants